multi-party conversation
Addressing Antisocial Behavior in Multi-Party Dialogs Through Multimodal Representation Learning
Bakarou, Hajar, Messoussi, Mohamed Sinane El, Ollagnier, Anaïs
Antisocial behavior (ASB) on social media -- including hate speech, harassment, and cyberbullying -- poses growing risks to platform safety and societal well-being. Prior research has focused largely on networks such as X and Reddit, while multi-party conversational settings remain underexplored due to limited data. To address this gap, we use CyberAgressionAdo-Large, a French open-access dataset simulating ASB in multi-party conversations, and evaluate three tasks: abuse detection, bullying behavior analysis, and bullying peer-group identification. We benchmark six text-based and eight graph-based representation-learning methods, analyzing lexical cues, interactional dynamics, and their multimodal fusion. Results show that multimodal models outperform unimodal baselines. The late fusion model mBERT + WD-SGCN achieves the best overall results, with top performance on abuse detection (0.718) and competitive scores on peer-group identification (0.286) and bullying analysis (0.606). Error analysis highlights its effectiveness in handling nuanced ASB phenomena such as implicit aggression, role transitions, and context-dependent hostility.
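To make the idea of late fusion concrete, here is a minimal sketch that combines the per-class probabilities of a text model and a graph model after each has made its own prediction. The abstract does not state the fusion rule, so the weighted average, the function names `late_fusion` and `predict`, the `text_weight` parameter, and the scores are all assumptions for illustration.

```python
from typing import Sequence


def late_fusion(text_probs: Sequence[float],
                graph_probs: Sequence[float],
                text_weight: float = 0.5) -> list[float]:
    """Weighted average of two per-class probability vectors.

    Assumption: a simple convex combination; the paper may fuse differently.
    """
    if len(text_probs) != len(graph_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    return [text_weight * t + (1.0 - text_weight) * g
            for t, g in zip(text_probs, graph_probs)]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Made-up scores for one message over the classes [not abusive, abusive].
text_probs = [0.60, 0.40]   # the text model leans "not abusive"
graph_probs = [0.20, 0.80]  # the graph model leans "abusive"
fused = late_fusion(text_probs, graph_probs)
print(predict(fused))
```

With equal weights, the interaction-based evidence outweighs the lexical evidence in this toy case, which is the kind of complementarity a multimodal model is meant to exploit.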
Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs
Gody, Reem, Abdelghaffar, Mohamed, Jabreel, Mohammed, Tawfik, Ahmed
Large language models (LLMs) have showcased remarkable capabilities in conversational AI, enabling open-domain responses in chatbots, as well as advanced processing of conversations like summarization, intent classification, and insights generation. However, these models are resource-intensive, demanding substantial memory and computational power. To address this, we propose a cost-effective solution that filters conversational snippets of interest for LLM processing, tailored to the target downstream application, rather than processing every snippet. In this work, we introduce an approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute-constrained environments. Our method combines different strategies to create a diverse multi-party conversational dataset that is annotated with the target intents and is then used to fine-tune the MobileBERT model for multi-label intent classification. This model achieves a balance between efficiency and performance, effectively filtering conversation snippets based on their intents. By passing only the relevant snippets to the LLM for further processing, our approach significantly reduces overall operational costs, depending on the intents and the data distribution, as demonstrated in our experiments.
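The filtering step described above can be sketched as follows: a lightweight multi-label classifier scores each snippet, and only snippets whose score for a target intent clears a threshold are forwarded to the LLM. Everything named here is hypothetical: `filter_snippets`, the intent labels, the threshold, and `toy_classifier`, which is a keyword stand-in for the fine-tuned model and not the authors' classifier.

```python
from typing import Callable, Iterable

IntentScores = dict[str, float]


def filter_snippets(snippets: Iterable[str],
                    classify: Callable[[str], IntentScores],
                    target_intents: set[str],
                    threshold: float = 0.5) -> list[str]:
    """Keep snippets where at least one target intent scores >= threshold."""
    kept = []
    for snippet in snippets:
        scores = classify(snippet)
        if any(scores.get(intent, 0.0) >= threshold
               for intent in target_intents):
            kept.append(snippet)
    return kept


def toy_classifier(snippet: str) -> IntentScores:
    """Stand-in for a fine-tuned classifier: keyword lookup, not a model."""
    text = snippet.lower()
    return {
        "schedule_meeting": 1.0 if "meet" in text else 0.0,
        "request_info": 1.0 if "?" in text else 0.0,
    }


snippets = [
    "A: Can we meet on Friday? B: Works for me.",
    "A: Nice weather today. B: Indeed.",
]
relevant = filter_snippets(snippets, toy_classifier, {"schedule_meeting"})
print(len(relevant))  # number of snippets that would be sent to the LLM
```

The saving comes from the second snippet never reaching the LLM; how large that saving is depends on how often the target intents occur in the data.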
SS-MPC: A Sequence-Structured Multi-Party Conversation System
Jang, Yoonjin, Kim, Keunha, Ko, Youngjoong
Recent Multi-Party Conversation (MPC) models typically rely on graph-based approaches to capture dialogue structures. However, these methods have limitations, such as information loss during the projection of utterances into structural embeddings and constraints in leveraging pre-trained language models directly. In this paper, we propose SS-MPC, a response generation model for MPC that eliminates the need for explicit graph structures. Unlike existing models that depend on graphs to analyze conversation structures, SS-MPC internally encodes the dialogue structure as a sequential input, enabling direct utilization of pre-trained language models. Experimental results show that SS-MPC achieves 15.60% BLEU-1 and 12.44% ROUGE-L, outperforming the current state-of-the-art MPC response generation model by 3.91 percentage points in BLEU-1 and 0.62 percentage points in ROUGE-L. Additionally, human evaluation confirms that SS-MPC generates more fluent and accurate responses compared to existing MPC models.
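One way to picture encoding dialogue structure as a sequential input is to flatten each utterance, together with its speaker and reply-to link, into a single string that a pre-trained language model can consume directly. The marker format below (`[utt=…]`, `[spk=…]`, `[reply_to=…]`) and the names `Utterance` and `linearize` are assumptions for illustration, not the SS-MPC input format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    speaker: str
    text: str
    reply_to: Optional[int] = None  # index of the utterance being answered


def linearize(dialogue: list[Utterance]) -> str:
    """Flatten a reply-structured dialogue into one marked-up sequence."""
    parts = []
    for idx, utt in enumerate(dialogue):
        # A reply link may only point at an earlier utterance.
        if utt.reply_to is not None and not 0 <= utt.reply_to < idx:
            raise ValueError(f"utterance {idx} must reply to an earlier turn")
        target = "none" if utt.reply_to is None else str(utt.reply_to)
        parts.append(
            f"[utt={idx}] [spk={utt.speaker}] [reply_to={target}] {utt.text}"
        )
    return " ".join(parts)


dialogue = [
    Utterance("A", "Anyone tried the new build?"),
    Utterance("B", "Yes, it crashes on start.", reply_to=0),
    Utterance("C", "Works for me.", reply_to=0),
]
print(linearize(dialogue))
```

Because the structure lives in the token sequence itself, no separate graph encoder is needed; the trade-off is that the model has to learn what the markers mean.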
Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints
Penzo, Nicolò, Guerini, Marco, Lepri, Bruno, Glavaš, Goran, Tonelli, Sara
Multi-Party Conversations (MPCs) are widely studied across disciplines, with social media as a primary data source due to its accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited due to rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., as a consequence of "reply-to" links). This work explores the feasibility of generating diverse MPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as dialogue structure and participants' stance. We investigate two complementary strategies of leveraging LLMs in this context: (i) LLMs as MPC generators, where we task the LLM to generate a whole MPC at once, and (ii) LLMs as MPC parties, where the LLM generates one turn of the conversation at a time, given the conversation history. We next introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the quality of the obtained MPCs via human annotation and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some being able to generate high-quality MPCs. We also find that turn-by-turn generation yields better conformance to constraints and higher linguistic variability than generating MPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality MPCs.
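Checking compliance with a dialogue-structure constraint can be illustrated with a small scoring function that compares the planned sequence of (speaker, addressee) pairs with the sequence found in a generated conversation. The metric, the `Turn` representation, and the name `structure_compliance` are hypothetical; the abstract does not describe how the authors' analytical framework computes compliance.

```python
from typing import Optional

# (speaker, addressee); an addressee of None means the whole group.
Turn = tuple[str, Optional[str]]


def structure_compliance(planned: list[Turn], generated: list[Turn]) -> float:
    """Fraction of turn positions where speaker and addressee both match."""
    if not planned:
        raise ValueError("planned structure must not be empty")
    matches = sum(1 for p, g in zip(planned, generated) if p == g)
    # Dividing by the longer length penalises missing or extra turns.
    return matches / max(len(planned), len(generated))


planned = [("A", None), ("B", "A"), ("C", "A"), ("A", "C")]
generated = [("A", None), ("B", "A"), ("C", "B"), ("A", "C")]
print(structure_compliance(planned, generated))  # third turn deviates
```

A position-wise score like this is strict: a single inserted turn shifts every later position, so a real evaluation might prefer an alignment-based comparison.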
Advancing Multi-Party Dialogue Systems with Speaker-ware Contrastive Learning
Hu, Zhongtian, He, Qi, Li, Ronghan, Zhao, Meng, Wang, Lifang
Dialogue response generation has made significant progress, but most research has focused on dyadic dialogue. In contrast, multi-party dialogues involve more participants, each potentially discussing different topics, making the task more complex. Current methods often rely on graph neural networks to model dialogue context, which helps capture the structural dynamics of multi-party conversations. However, these methods are heavily dependent on intricate graph structures and dataset annotations, and they often overlook the distinct speaking styles of participants. To address these challenges, we propose CMR, a Contrastive learning-based Multi-party dialogue Response generation model. CMR uses self-supervised contrastive learning to better distinguish "who says what." Additionally, by comparing speakers within the same conversation, the model captures differences in speaking styles and thematic transitions. To the best of our knowledge, this is the first approach to apply contrastive learning in multi-party dialogue generation. Experimental results show that CMR significantly outperforms state-of-the-art models in multi-party dialogue response tasks.
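As a rough illustration of contrasting speakers within one conversation, the sketch below computes a supervised-contrastive-style loss in which utterances by the same speaker are positives and all other utterances are negatives. This is a generic formulation chosen for illustration, not CMR's actual objective; the function names, the temperature, and the toy embeddings are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_contrastive_loss(embeddings: list[list[float]],
                             speakers: list[str],
                             temperature: float = 0.1) -> float:
    """Average loss over anchors that have a same-speaker positive."""
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and speakers[j] == speakers[i]]
        if not positives:
            continue  # no other utterance by this speaker to pull towards
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        losses.append(
            -sum(logits[j] - log_denom for j in positives) / len(positives)
        )
    return sum(losses) / len(losses) if losses else 0.0


# Toy utterance embeddings: the first two are close, as are the last two.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = speaker_contrastive_loss(embeddings, ["A", "A", "B", "B"])
shuffled = speaker_contrastive_loss(embeddings, ["A", "B", "A", "B"])
print(aligned < shuffled)
```

The loss is low when embeddings of the same speaker cluster together and high when speaker labels cut across the clusters, which is the signal that lets a model separate "who says what."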
Fast Multi-Party Open-Ended Conversation with a Social Robot
Abbo, Giulio Antonio, Pinto-Bernal, Maria Jose, Catrycke, Martijn, Belpaeme, Tony
This paper presents the implementation and evaluation of a conversational agent designed for multi-party open-ended interactions. Leveraging state-of-the-art technologies such as voice direction of arrival, voice recognition, face tracking, and large language models, the system aims to facilitate natural and intuitive human-robot conversations. Deployed on the Furhat robot, the system was tested with 30 participants engaging in open-ended group conversations and then in two overlapping discussions. Quantitative metrics, such as latencies and recognition accuracy, along with qualitative measures from user questionnaires, were collected to assess performance. The results highlight the system's effectiveness in managing multi-party interactions, though improvements are needed in response relevance and latency. This study contributes valuable insights for advancing human-robot interaction, particularly in enhancing the naturalness and engagement in group conversations.
Proactive Conversational Agents with Inner Thoughts
Liu, Xingyu Bruce, Fang, Shitao, Shi, Weiyan, Wu, Chien-Sheng, Igarashi, Takeo, Chen, Xiang 'Anthony'
One of the long-standing aspirations in conversational AI is to allow agents to autonomously take initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that, just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. In a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
Wang, Yueqian, Meng, Xiaojun, Wang, Yuxuan, Liang, Jianxin, Liu, Qun, Zhao, Dongyan
Multi-modal multi-party conversation (MMC) is a less studied yet important research topic because it fits real-world scenarios well and thus potentially has broader applications. Compared with traditional multi-modal conversations, MMC requires stronger character-centered understanding abilities, as many interlocutors appear in both the visual and textual context. To facilitate the study of this problem, we present Friends-MMC, an MMC dataset that contains 24,000+ unique utterances paired with video context. To explore the character-centered understanding of the dialogue, we also annotate the speaker of each utterance and the names and bounding boxes of the faces that appear in the video. Based on this Friends-MMC dataset, we further study two fundamental MMC tasks: conversation speaker identification and conversation response prediction, both of which have a multi-party nature with the video or image as visual context. For conversation speaker identification, we demonstrate the inefficiencies of existing methods such as pre-trained models, and propose a simple yet effective baseline method that leverages an optimization solver to utilize the context of the two modalities to achieve better performance. For conversation response prediction, we fine-tune generative dialogue models on Friends-MMC and analyze the benefits of speaker information. The code and dataset are publicly available at https://github.com/yellow-binary-tree/Friends-MMC, and we call for more attention to modeling speaker information when understanding conversations.
A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation
Bečková, Iveta, Pócoš, Štefan, Belgiovine, Giulia, Matarese, Marco, Sciutti, Alessandra, Mazzola, Carlo
Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. In the field of human-robot interaction specifically, it becomes even more crucial for enabling social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot to estimating only whether it was addressed and limiting its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence thus plays a significant role in current machine learning applications and models, providing explanations for their decisions alongside strong performance. In our work, we a) present an addressee estimation model with improved performance in comparison with the previous SOTA; b) further modify this model to include inherently explainable attention-based segments; c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; d) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and e) perform a pilot user study to analyze the effect of various explanations on how human participants perceive the robot.
A Study on Social Robot Behavior in Group Conversation
Nguyen, Tung, Nichols, Eric, Gomez, Randy
Recently, research in human-robot interaction has begun to consider a robot's influence at the group level. Despite the recent growth in research investigating the effects of robots within groups of people, our overall understanding of what happens when robots are placed within groups or teams of people is still limited. This paper investigates several key problems for social robots that manage conversations in a group setting, where the number of participants is more than two. In a group setting, the conversation dynamics are much more complicated than in a conventional one-to-one conversation, so more challenges need to be solved.